Efficient Construction of Probabilistic Tree Embeddings
In this paper we describe an algorithm that embeds a graph metric $(V,d_G)$
on an undirected weighted graph $G=(V,E)$ into a distribution of tree metrics
$(T,d_T)$ such that for every pair $u,v\in V$, $d_G(u,v)\leq d_T(u,v)$ and
$\mathbb{E}_T[d_T(u,v)]\leq O(\log n)\cdot d_G(u,v)$. Such embeddings have
proved highly useful in designing fast approximation algorithms, as many hard
problems on graphs are easy to solve on tree instances. For a graph with $n$
vertices and $m$ edges, our algorithm runs in $O(m\log n)$ time with high
probability, which improves the previous upper bound of $O(m\log^3 n)$ shown by
Mendel et al. in 2009.
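The distribution of tree metrics above can be illustrated with the classic sampling procedure of Fakcharoenphol, Rao, and Talwar (FRT), run naively on a distance matrix. This is a minimal sketch, assuming at least two points, all distinct points at distance at least 1, and one common normalization (level radii β·2^(i-1), edge weight 2^(i+1) between levels i and i+1). It scans every vertex per level and is not the paper's $O(m\log n)$ algorithm; the names `frt_centers` and `frt_tree_distance` are hypothetical.

```python
import math
import random


def frt_centers(dist, rng):
    """Sample one FRT hierarchy on a finite metric given as a distance matrix.

    Assumes at least two points and dist[u][v] >= 1 for u != v.
    Returns (centers, delta) where centers[v][i] is v's center at level i.
    """
    n = len(dist)
    diameter = max(max(row) for row in dist)
    # Pick delta so the top radius beta * 2**(delta - 1) covers the whole metric.
    delta = math.ceil(math.log2(diameter)) + 1
    perm = list(range(n))
    rng.shuffle(perm)          # random priority order of candidate centers
    beta = 1.0 + rng.random()  # beta in [1, 2); keeps the level-0 radius below 1
    centers = [[None] * (delta + 1) for _ in range(n)]
    for v in range(n):
        for i in range(delta + 1):
            radius = beta * 2 ** (i - 1)
            # Center = first vertex in permutation order within the radius of v.
            centers[v][i] = next(u for u in perm if dist[v][u] <= radius)
    return centers, delta


def frt_tree_distance(centers, u, v):
    """Tree distance when the edge between levels i and i+1 has weight 2**(i+1)."""
    if u == v:
        return 0
    # Clusters are prefixes of center sequences read from the top level down,
    # so u and v split at the highest level where their centers differ.
    split = max(i for i in range(len(centers[u])) if centers[u][i] != centers[v][i])
    # Leaf-to-ancestor path at level split + 1 weighs 2 + 4 + ... + 2**(split + 1).
    return 2 * (2 ** (split + 2) - 2)


if __name__ == "__main__":
    points = [0, 1, 3, 7]
    dist = [[abs(a - b) for b in points] for a in points]
    centers, delta = frt_centers(dist, random.Random(1))
    print(frt_tree_distance(centers, 0, 3), ">=", dist[0][3])
```

Two vertices that split at level s share a cluster of diameter below 2^(s+2) one level up, while their tree distance is 2^(s+3) - 4, so $d_G\leq d_T$ holds for every sample; the $O(\log n)$ expected stretch comes from the randomness of both the permutation and β.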
The key component of our algorithm is a new approximate single-source
shortest-path algorithm, which implements the priority queue with a new data
structure, the "bucket-tree structure". The algorithm has three properties: it
only requires linear time in the number of edges in the input graph; the
computed distances have a distance-preserving property; and when computing the
shortest paths to the $k$-nearest vertices from the source, it only needs to
visit these vertices and their edge lists. These properties are essential to
guarantee the correctness and the stated time bound.
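The $k$-nearest-vertices property can be illustrated with an ordinary Dijkstra search that stops after settling $k$ vertices, so only those vertices' edge lists are scanned. This sketch uses Python's `heapq` binary heap and exact distances, not the bucket-tree structure or the approximate distances described above, so it pays a logarithmic heap cost per edge instead of linear time; the adjacency format and the name `k_nearest` are assumptions for illustration.

```python
import heapq


def k_nearest(adj, source, k):
    """Exact distances from source to its k nearest vertices (ties broken arbitrarily).

    adj maps each vertex to a list of (neighbor, nonnegative weight) pairs.
    The search stops once k vertices are settled, so only the edge lists of
    settled vertices are ever scanned.
    """
    dist = {}
    heap = [(0, source)]
    while heap and len(dist) < k:
        d, u = heapq.heappop(heap)
        if u in dist:
            continue  # stale entry: u was already settled
        dist[u] = d
        for v, w in adj.get(u, []):
            if v not in dist:
                heapq.heappush(heap, (d + w, v))
    return dist


if __name__ == "__main__":
    adj = {0: [(1, 1), (2, 4)], 1: [(0, 1), (2, 1)], 2: [(0, 4), (1, 1)]}
    print(k_nearest(adj, 0, 2))
```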
Using this shortest-path algorithm, we show how to generate an intermediate
structure, the approximate dominance sequences of the input graph, in $O(m\log n)$
time, and further propose a simple yet efficient algorithm to convert these
sequences to a tree embedding in $O(n\log n)$ time, both with high
probability. Combining the three subroutines gives the stated time bound of the
algorithm.
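One way to read a dominance sequence, assuming it follows the usual FRT view: for a vertex v and a random permutation of the vertices, keep each vertex u that is strictly closer to v than every vertex before it in the permutation. These are exactly the vertices that can serve as v's FRT center at some radius, and the center at radius r is the first kept vertex within distance r. The helper names below are hypothetical, and the sketch works from exact distances rather than the approximate distances used in the paper.

```python
import math


def dominance_sequence(dist_from_v, perm):
    """Vertices strictly closer to v than every earlier vertex in perm, with distances.

    dist_from_v[u] is the distance from v to u; perm is a permutation of the vertices.
    """
    seq = []
    best = math.inf
    for u in perm:
        d = dist_from_v[u]
        if d < best:  # a new prefix minimum: u can be v's center at some radius
            seq.append((u, d))
            best = d
    return seq


def center_at_radius(seq, r):
    """FRT-style center of v at radius r: first vertex in perm within distance r."""
    # Distances in seq strictly decrease, and the first vertex in perm within
    # distance r is always a prefix minimum, so scanning seq finds it.
    for u, d in seq:
        if d <= r:
            return u
    raise ValueError("radius must be at least 0")


if __name__ == "__main__":
    seq = dominance_sequence([5, 0, 2, 3], [3, 0, 2, 1])
    print(seq, center_at_radius(seq, 2.5))
```

For a uniformly random permutation and distinct distances, the expected number of kept vertices is the harmonic number H_n = O(log n), which is why such sequences stay short.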
We then show that this efficient construction can facilitate several
applications. We prove that FRT trees (the generated tree embedding) are
Ramsey partitions with an asymptotically tight bound, so the construction of a
series of distance oracles can be accelerated.
The Geometry of Tree-Based Sorting
We study the connections between sorting and the binary search tree (BST) model, with an aim towards showing that the fields are connected more deeply than is currently appreciated. While any BST can be used to sort by inserting the keys one-by-one, this is a very limited relationship and importantly says nothing about parallel sorting. We show what we believe to be the first formal relationship between the BST model and sorting. Namely, we show that a large class of sorting algorithms, which includes mergesort, quicksort, insertion sort, and almost every instance-optimal sorting algorithm, are equivalent in cost to offline BST algorithms. Our main theoretical tool is the geometric interpretation of the BST model introduced by Demaine et al. [Demaine et al., 2009], which finds an equivalence between searches on a BST and point sets in the plane satisfying a certain property. To give an example of the utility of our approach, we introduce the log-interleave bound, a measure of the information-theoretic complexity of a permutation π, which is within a lg lg n multiplicative factor of a known lower bound in the BST model; we also devise a parallel sorting algorithm with polylogarithmic span that sorts a permutation π using comparisons proportional to its log-interleave bound. Our aforementioned result on sorting and offline BST algorithms can be used to show the existence of an offline BST algorithm whose cost is within a constant factor of the log-interleave bound of any permutation π.
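The geometric view of Demaine et al. represents a BST execution as points (key, time), one for each node touched by an access, and calls a point set arborally satisfied when every two points not on a common horizontal or vertical line span a closed axis-aligned rectangle containing some third point. A brute-force check of that property is sketched below; the function name is hypothetical and the cubic-time loop is chosen for clarity, not speed.

```python
from itertools import combinations


def is_arborally_satisfied(points):
    """Check the arborally satisfied property on a set of (x, y) points.

    Every pair of points not sharing an x- or y-coordinate must span a closed
    rectangle that contains at least one other point of the set.
    """
    pts = set(points)
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2 or y1 == y2:
            continue  # pairs on a common line impose no condition
        lo_x, hi_x = min(x1, x2), max(x1, x2)
        lo_y, hi_y = min(y1, y2), max(y1, y2)
        has_third = any(
            lo_x <= x <= hi_x and lo_y <= y <= hi_y and (x, y) not in ((x1, y1), (x2, y2))
            for (x, y) in pts
        )
        if not has_third:
            return False
    return True


if __name__ == "__main__":
    print(is_arborally_satisfied({(1, 1), (2, 2)}))          # rectangle is empty
    print(is_arborally_satisfied({(1, 1), (2, 2), (1, 2)}))  # corner point added
```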
LL/SC and Atomic Copy: Constant Time, Space Efficient Implementations Using Only Pointer-Width CAS
When designing concurrent algorithms, Load-Link/Store-Conditional (LL/SC) is
often the ideal primitive to have because, unlike Compare and Swap (CAS), LL/SC
is immune to the ABA problem. However, the full semantics of LL/SC are not
supported by any modern machine, so there has been a significant amount of work
on simulations of LL/SC using CAS, a synchronization
primitive that enjoys widespread hardware support. All of the algorithms so far
that are constant time either use unbounded sequence numbers (and thus base
objects of unbounded size), or require $\Omega(MP)$ space for $M$ LL/SC objects
(where $P$ is the number of processes). We present a constant time
implementation of $M$ LL/SC objects using $\Theta(M+kP^2)$ space, where $k$ is
the maximum number of overlapping LL/SC operations per process (usually a
constant), and requiring only pointer-sized CAS objects. Our implementation can
also be used to implement $L$-word LL/SC objects in $\Theta(L)$ time (for
both LL and SC) and $\Theta((M+kP^2)L)$ space. To achieve these bounds, we
begin by implementing a new primitive called Single-Writer Copy, which takes a
pointer to a word-sized memory location and atomically copies its contents into
another object. The restriction is that only one process is allowed to
write/copy into the destination object at a time. We believe this primitive
will be very useful in designing other concurrent algorithms as well.
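The ABA problem mentioned above can be shown with a sequential model: a CAS compares values only, so it succeeds after the value changes from A to B and back to A, whereas an SC must fail if any successful SC happened since the matching LL. The sketch below obtains LL/SC semantics by pairing the value with an ever-growing version counter, which is the unbounded-sequence-number approach the abstract contrasts with, not the paper's bounded-space construction. The classes are hypothetical single-threaded stand-ins that model the semantics only; they are not atomic under real concurrency.

```python
class CASWord:
    """Sequential model of a CAS register (not actually atomic in Python)."""

    def __init__(self, value):
        self.value = value

    def cas(self, expected, new):
        if self.value == expected:  # compares values only, hence ABA
            self.value = new
            return True
        return False


class TaggedLLSC:
    """LL/SC from CAS on a (value, version) pair; the version grows without bound."""

    def __init__(self, value):
        self.cell = CASWord((value, 0))

    def ll(self):
        return self.cell.value  # snapshot of (value, version)

    def sc(self, snapshot, new):
        _, version = snapshot
        # Succeeds only if no successful SC happened since the snapshot was taken.
        return self.cell.cas(snapshot, (new, version + 1))

    def load(self):
        return self.cell.value[0]


if __name__ == "__main__":
    w = CASWord("A"); seen = w.value
    w.cas("A", "B"); w.cas("B", "A")
    print("CAS after A->B->A:", w.cas(seen, "C"))   # succeeds despite the writes
    x = TaggedLLSC("A"); snap = x.ll()
    x.sc(x.ll(), "B"); x.sc(x.ll(), "A")
    print("SC after A->B->A:", x.sc(snap, "C"))     # fails: version has moved on
```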
Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model
In this paper we develop optimal algorithms in the binary-forking model for a
variety of fundamental problems, including sorting, semisorting, list ranking,
tree contraction, range minima, and ordered set union, intersection and
difference. In the binary-forking model, tasks can only fork into two child
tasks, but can do so recursively and asynchronously. The tasks share memory,
supporting reads, writes and test-and-sets. Costs are measured in terms of work
(total number of instructions), and span (longest dependence chain).
The binary-forking model is meant to capture both algorithm performance and
algorithm-design considerations on many existing multithreaded languages, which
are also asynchronous and rely on binary forks either explicitly or under the
covers. In contrast to the widely studied PRAM model, it does not assume
arbitrary-way forks nor synchronous operations, both of which are hard to
implement in modern hardware. While optimal PRAM algorithms are known for the
problems studied herein, it turns out that arbitrary-way forking and strict
synchronization are powerful, if unrealistic, capabilities. Natural simulations
of these PRAM algorithms in the binary-forking model (i.e., implementations in
existing parallel languages) incur an $\Omega(\log n)$ overhead in span. This
paper explores techniques for designing optimal algorithms when limited to
binary forking and assuming asynchrony. All algorithms described in this paper
are the first algorithms with optimal work and span in the binary-forking
model. Most of the algorithms are simple. Many are randomized.
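To make the work and span measures concrete, the sketch below sums an array by recursive binary forking and returns the work (total unit steps) and span (longest dependence chain) alongside the result. It runs sequentially and only accounts for costs as if the two recursive calls were forked; charging one unit per leaf and one per fork/join-plus-add is an assumption of this sketch. For n leaves it gives work 2n - 1 and span ⌈log2 n⌉ + 1, i.e., O(n) work and O(log n) span.

```python
def reduce_cost(a, lo, hi):
    """Sum a[lo:hi] (hi > lo) by binary forking; return (sum, work, span).

    Costs are simulated: the two recursive calls stand for forked child tasks.
    """
    if hi - lo == 1:
        return a[lo], 1, 1  # a leaf task does one unit of work
    mid = (lo + hi) // 2
    left, work_l, span_l = reduce_cost(a, lo, mid)    # child task 1
    right, work_r, span_r = reduce_cost(a, mid, hi)   # child task 2
    # Work adds up across both children; span follows the longer child,
    # plus one unit for the join and the addition.
    return left + right, work_l + work_r + 1, max(span_l, span_r) + 1


if __name__ == "__main__":
    print(reduce_cost(list(range(8)), 0, 8))  # (sum, work, span)
```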
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors
This paper considers issues of memory performance in shared memory multiprocessors that provide a high-bandwidth network and in which the memory banks are slower than the processors. We are concerned with the effects of memory bank contention, memory bank delay, and the bank expansion factor (the ratio of the number of banks to the number of processors) on performance, particularly for irregular memory access patterns. This work was motivated by observed discrepancies between predicted and actual performance in a number of irregular algorithms implemented for the Cray C90 when the memory contention at a particular location is high. We develop a formal framework for studying memory bank contention and delay, and show several results, both experimental and theoretical. We first show experimentally that our framework is a good predictor of performance on the Cray C90 and J90, providing a good accounting of bank contention and delay. Second, we show that it often improves performance to have addi..
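A toy illustration of bank contention, not the framework developed in the paper: assume address a lives in bank a mod B and that each bank serves its requests one at a time, each occupying the bank for d cycles. A batch of requests then needs at least (maximum number of requests to any one bank) × d cycles, so a hot spot or a bank-aligned stride costs far more than evenly spread addresses. The function name and the modulo bank mapping are assumptions for illustration.

```python
from collections import Counter


def bank_time_lower_bound(addresses, num_banks, bank_delay):
    """Lower bound on cycles to serve a batch of memory requests.

    Assumes address a maps to bank a % num_banks and that each bank handles
    one request at a time, each taking bank_delay cycles.
    """
    addresses = list(addresses)
    if not addresses:
        return 0
    loads = Counter(a % num_banks for a in addresses)
    # The busiest bank must serialize all of its requests.
    return max(loads.values()) * bank_delay


if __name__ == "__main__":
    processors, banks, delay = 8, 32, 6  # bank expansion factor 32 / 8 = 4
    print(bank_time_lower_bound(range(processors), banks, delay))          # spread
    print(bank_time_lower_bound([100] * processors, banks, delay))         # hot spot
    print(bank_time_lower_bound([32 * i for i in range(8)], banks, delay)) # stride
```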